CONCENTRATION BOUNDS FOR STOCHASTIC APPROXIMATIONS 
Abstract. We obtain non-asymptotic concentration bounds for two kinds of stochastic approximations. We first consider the deviations between the expectation of a given function of an Euler-like discretization scheme of some diffusion process at a fixed deterministic time and its empirical mean obtained by the Monte Carlo procedure. We then give some estimates concerning the deviation between the value at a given time step of a stochastic approximation algorithm and its target. Under suitable assumptions, both concentration bounds turn out to be Gaussian. The key tool consists in accurately exploiting the concentration properties of the increments of the schemes. Also, no specific non-degeneracy conditions are assumed.
1. Statement of the Problem

Let us consider a $d$-dimensional stochastic evolution scheme of the form

$$\xi_{n+1} = \xi_n + \gamma_{n+1}F(n,\xi_n,Y_{n+1}),\quad n\ge0,\quad \xi_0 = x\in\mathbb R^d, \tag{1.1}$$

where $(\gamma_n)_{n\ge1}$ is a deterministic positive sequence of time steps, $F:\mathbb N\times\mathbb R^d\times\mathbb R^q\to\mathbb R^d$ is a measurable function satisfying some assumptions that will be specified later on, and the $(Y_i)_{i\in\mathbb N^*}$ are i.i.d. $\mathbb R^q$-valued random variables defined on some probability space $(\Omega,\mathcal F,\mathbb P)$. Their law satisfies a Gaussian concentration property: there exists $\alpha>0$ such that, for every real-valued 1-Lipschitz function $f$ defined on $\mathbb R^q$ and for all $\lambda>0$,

$$\mathbb E[\exp(\lambda f(Y_1))]\le\exp\Big(\lambda\mathbb E[f(Y_1)]+\frac{\alpha\lambda^2}{4}\Big). \tag{GC($\alpha$)}$$

From the exponential Markov inequality and (GC($\alpha$)), one derives $\mathcal P(f,r) := \mathbb P[f(Y_1)-\mathbb E[f(Y_1)]\ge r]\le\exp(-\lambda r+\alpha\lambda^2/4)$ for all $\lambda,r>0$. Optimizing over $\lambda$ (taking $\lambda = 2r/\alpha$) shows that $\mathcal P(f,r)$ has sub-Gaussian tails bounded by $\exp(-r^2/\alpha)$.



A practical criterion for (GC($\alpha$)) to hold is given by Bolley and Villani [6]: if there exists $\varepsilon>0$ such that $\mathbb E[\exp(\varepsilon|Y_1|^2)]<+\infty$, then the law of $Y_1$ satisfies (GC($\alpha$)) with $\alpha := \alpha(\varepsilon)$. The two claims are actually equivalent. In the following, (GC($\alpha$)) is the only crucial property we require of the innovations $(Y_i)_{i\in\mathbb N^*}$. In particular, we do not assume that the law of $Y_1$ is absolutely continuous w.r.t. the Lebesgue measure.
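The sub-Gaussian tail bound $\exp(-r^2/\alpha)$ can be checked empirically. The sketch below is a NumPy illustration, not part of the original text. It uses the standard Gaussian law on $\mathbb R^q$, for which the text later states $\alpha = 2$, and the 1-Lipschitz function $y\mapsto|y|$. The expectation is replaced by a sample mean, and the sample sizes and thresholds are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
q, n_samples = 5, 200_000
alpha = 2.0  # (GC(alpha)) constant of the standard Gaussian law N(0, I_q)

Y = rng.standard_normal((n_samples, q))
g = np.linalg.norm(Y, axis=1)            # y -> |y| is 1-Lipschitz on R^q
centred = g - g.mean()                   # E[g(Y_1)] replaced by its empirical mean

thresholds = np.array([0.5, 1.0, 1.5, 2.0])
tails = np.array([np.mean(centred >= r) for r in thresholds])
bounds = np.exp(-thresholds**2 / alpha)  # sub-Gaussian bound exp(-r^2 / alpha)
for r, t, b in zip(thresholds, tails, bounds):
    print(f"r = {r:.1f}  empirical tail = {t:.5f}  bound exp(-r^2/alpha) = {b:.5f}")
```

The bound is not tight for this particular function. It only has to hold uniformly over all 1-Lipschitz functions.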
We are interested in non-asymptotic concentration bounds for two specific problems related to evolutions of type (1.1). First, we want to control how far the empirical mean associated with a function of an Euler-like discretization scheme of a diffusion process, taken at a fixed deterministic time, deviates from the true mean. Second, we want to derive deviation estimates between the value of a Robbins-Monro type stochastic algorithm at a fixed time step and its target. Under some mild assumptions, we show that the Gaussian concentration property of the innovations transfers to the scheme. To the best of our knowledge, our deviation results for stochastic algorithms are the first of this nature.

1.1. Euler-like Scheme of a Diffusion Process

Let $(\Omega,\mathcal F,(\mathcal F_t)_{t\ge0},\mathbb P)$ be a filtered probability space satisfying the usual conditions, and let $(W_t)_{t\ge0}$ be a $q$-dimensional $(\mathcal F_t)_{t\ge0}$ Brownian motion. Let us consider a $d$-dimensional diffusion process $(X_t)_{t\ge0}$ with dynamics

$$X_t = x+\int_0^t b(s,X_s)\,ds+\int_0^t\sigma(s,X_s)\,dW_s, \tag{1.2}$$

where the coefficients $b,\sigma$ are assumed to be uniformly Lipschitz continuous in space and measurable in time.

For a given Lipschitz continuous function $f$ and a fixed deterministic time horizon $T$, quantities like $\mathbb E_x[f(X_T)]$ appear in many applications. In mathematical finance, such a quantity is the price of a European option with maturity $T$ when the dynamics of the underlying asset are given by (1.2). Under suitable assumptions on the function $f$ and the coefficients $b,\sigma$ (namely smoothness or non-degeneracy), it can also be related to the Feynman-Kac representation of the heat equation associated with the generator of $X$. Two steps are needed to approximate $\mathbb E_x[f(X_T)]$:

- The first step consists in approximating the dynamics by a discretization scheme that can be simulated. For a given time step $\Delta = T/N$, $N\in\mathbb N^*$, and setting $t_i := i\Delta$ for all $i\in\mathbb N$, we consider an Euler-like scheme of the form

$$X_0^\Delta = x,\quad\forall i\in[0,N-1],\ X_{t_{i+1}}^\Delta = X_{t_i}^\Delta+b(t_i,X_{t_i}^\Delta)\Delta+\sigma(t_i,X_{t_i}^\Delta)\sqrt\Delta\,Y_{i+1}, \tag{1.3}$$

where the $(Y_i)_{i\in\mathbb N^*}$ are $\mathbb R^q$-valued i.i.d. random variables whose law satisfies (GC($\alpha$)) for some $\alpha>0$. We also assume $\mathbb E[Y_1] = 0_q$ and $\mathbb E[Y_1Y_1^*] = I_q$. Here $Y_1^*$ is the transpose of the column vector $Y_1$, and $0_q$ and $I_q$ are respectively the zero vector of $\mathbb R^q$ and the identity matrix of $\mathbb R^q\otimes\mathbb R^q$.

These assumptions include two cases. The first is the standard Euler scheme, corresponding to $Y_1\sim\mathcal N(0,I_q)$, which yields (GC($\alpha$)) with $\alpha = 2$. The second is the Bernoulli law $Y_1 = (B_1,\dots,B_q)$, where the $(B_k)_{k\in[1,q]}$ are i.i.d. with law $\mu = \frac12(\delta_{-1}+\delta_1)$. The Bernoulli choice can be computationally cheaper for approximating (1.2) when the dimension is large.



- The second step consists in approximating the expectation $\mathbb E_x[f(X_T^\Delta)]$ involving the scheme (1.3) by a Monte Carlo estimator:

$$E_M^\Delta(x,T,f) := \frac1M\sum_{j=1}^M f\big((X_T^{\Delta,0,x})^j\big),$$

where the $((X_T^{\Delta,0,x})^j)_{j\in[1,M]}$ are independent copies of the scheme (1.3) starting at $x$ at time $0$ and evaluated at time $T$.
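The two approximation steps above can be sketched in code as follows. This is a minimal NumPy illustration, not the authors' implementation. The helper names (`sample_innovations`, `euler_like_terminal`, `monte_carlo_estimator`) and the array-shape conventions for `b` and `sigma` are hypothetical choices. The example uses $d = q = 1$, $b(t,x) = -x$, $\sigma\equiv1$ and $f(x) = x$, for which $\mathbb E_x[X_T^\Delta] = x(1-\Delta)^N$ because the innovations are centred.

```python
import numpy as np

def sample_innovations(rng, shape, kind):
    """I.i.d. centred innovations with identity covariance: 'gaussian' or 'rademacher'."""
    if kind == "gaussian":
        return rng.standard_normal(shape)                       # Y ~ N(0, I_q)
    if kind == "rademacher":
        return 2.0 * rng.integers(0, 2, size=shape) - 1.0        # coordinates uniform on {-1, 1}
    raise ValueError(f"unknown innovation law: {kind}")

def euler_like_terminal(x0, b, sigma, T, N, M, q, rng, kind="gaussian"):
    """M independent copies of X_T^{Delta,0,x} from scheme (1.3); returns an (M, d) array.

    b(t, X) must return an (M, d) array and sigma(t, X) an (M, d, q) array.
    """
    delta = T / N
    X = np.tile(np.asarray(x0, dtype=float), (M, 1))
    for i in range(N):
        t_i = i * delta
        Y = sample_innovations(rng, (M, q), kind)               # Y_{i+1}
        X = X + b(t_i, X) * delta + np.sqrt(delta) * np.einsum("mdq,mq->md", sigma(t_i, X), Y)
    return X

def monte_carlo_estimator(f, x0, b, sigma, T, N, M, q, rng, kind="gaussian"):
    """E_M^Delta(x, T, f) = (1/M) sum_j f((X_T^{Delta,0,x})^j); f acts row-wise on (M, d)."""
    return float(np.mean(f(euler_like_terminal(x0, b, sigma, T, N, M, q, rng, kind))))

# Illustrative test case: d = q = 1, b(t, x) = -x, sigma = 1, f(x) = x.
def b_ou(t, X):
    return -X

def sigma_one(t, X):
    return np.ones(X.shape + (1,))

def first_coordinate(X):
    return X[:, 0]

rng = np.random.default_rng(2)
T, N, M = 1.0, 50, 20_000
estimate = monte_carlo_estimator(first_coordinate, [1.0], b_ou, sigma_one, T, N, M, 1, rng, "rademacher")
scheme_mean = (1.0 - T / N) ** N   # E_x[X_T^Delta] = x (1 - Delta)^N for this linear scheme
print(estimate, scheme_mean)
```

Switching `kind` between `"gaussian"` and `"rademacher"` changes only the innovation law. This reflects the fact that only (GC($\alpha$)) and the first two moments of $Y_1$ are used.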

The global error between $\mathbb E_x[f(X_T)]$, the quantity to estimate, and its implementable approximation $E_M^\Delta(x,T,f)$ can be decomposed as follows:

$$\mathcal E(\Delta,M,x,T,f) := \big(\mathbb E_x[f(X_T)]-\mathbb E_x[f(X_T^\Delta)]\big)+\big(\mathbb E_x[f(X_T^\Delta)]-E_M^\Delta(x,T,f)\big) =: \mathcal E_D(\Delta,x,T,f)+\mathcal E_S(\Delta,M,x,T,f). \tag{1.4}$$



The term $\mathcal E_D(\Delta,x,T,f)$ corresponds to the discretization error. It has been widely investigated in the literature since the seminal work of Talay and Tubaro [16]. For the standard Euler scheme, this contribution usually yields an error of order $\Delta$ under one of two kinds of assumptions:

- the coefficients $b,\sigma$ and the function $f$ are sufficiently smooth, which are the assumptions required in [16];
- $b,\sigma$ satisfy some non-degeneracy assumptions, which make it possible to weaken the smoothness assumptions on $f$.

The second case is covered, for instance, by Bally and Talay [1]. They obtain the expected order for a bounded measurable $f$ and smooth coefficients satisfying a (possibly weak) hypoellipticity condition, and their proof relies on Malliavin calculus. When the diffusion coefficient is uniformly elliptic and bounded, and $b,\sigma$ are also three times continuously differentiable, the control at order $\Delta$ for $\mathcal E_D(\Delta,x,T,f)$ can be derived from Konakov and Mammen [10], who use a more direct parametrix approach.

Suppose the Gaussian increments of the standard Euler scheme are replaced by more general (possibly discrete) random variables $(Y_i)_{i\ge1}$. If these have the same covariance matrix and odd moments up to order 5 as the standard Gaussian vector of $\mathbb R^q$, it can be checked that the error expansion at order $\Delta$ of [16] still holds for $b,\sigma,f$ smooth enough. In that framework, we also mention the works of Konakov and Mammen [5], [3] concerning local limit theorems for the difference between (1.2) and the scheme (1.3). As in [10], the coefficients are assumed to be smooth and $\sigma$ uniformly elliptic. The associated error is then of order $\Delta^{1/2}$, the speed of the Gaussian local limit theorem; see Bhattacharya and Rao [2].
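The order-$\Delta$ behaviour of $\mathcal E_D$ can be observed exactly in a linear example. This is an illustrative choice, not taken from the text: $d = 1$, $b(t,x) = -x$, a bounded $\sigma$, and $f(x) = x$. Then $\mathbb E_x[X_T] = xe^{-T}$, and since the innovations are centred, $\mathbb E_x[X_T^\Delta] = x(1-\Delta)^N$ whatever the innovation law. The sketch only evaluates these closed forms. Expanding $\log(1-\Delta)$ shows that $N\,\mathcal E_D$ should approach $xe^{-T}T^2/2$.

```python
import math

def euler_mean_linear(x, T, N):
    # Scheme (1.3) with b(t, x) = -x and centred innovations: E[X_{t_{i+1}}] = (1 - Delta) E[X_{t_i}],
    # hence E_x[X_T^Delta] = x (1 - Delta)^N, whatever the bounded diffusion coefficient.
    return x * (1.0 - T / N) ** N

def discretization_error(x, T, N):
    # E_D(Delta, x, T, f) for f(x) = x, using E_x[X_T] = x e^{-T} for dX_t = -X_t dt + sigma dW_t
    return x * math.exp(-T) - euler_mean_linear(x, T, N)

for N in (50, 100, 200, 400):
    err = discretization_error(1.0, 1.0, N)
    print(f"N = {N:4d}  E_D = {err:.3e}  N * E_D = {N * err:.5f}")
```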

The term $\mathcal E_S(\Delta,M,x,T,f)$ in (1.4) corresponds to the statistical error. Under the usual integrability condition $f(X_T^\Delta)\in L^2(\mathbb P)$, it is asymptotically controlled by the central limit theorem. A first non-asymptotic result is given by the Berry-Esseen theorem, provided $f(X_T^\Delta)\in L^3(\mathbb P)$. For practical purposes, however, the crucial quantity to control non-asymptotically is the deviation between the empirical mean $E_M^\Delta(x,T,f)$ and the true mean $\mathbb E_x[f(X_T^\Delta)]$. Precisely, for a fixed $M$ and a given threshold $r>0$, one would like to bound $\mathbb P[|E_M^\Delta(x,T,f)-\mathbb E_x[f(X_T^\Delta)]|\ge r]$.

In the ergodic framework and for a constant diffusion coefficient, Gaussian controls have been obtained by Malrieu and Talay [14]. In the current context and for the standard Euler scheme, a first attempt to establish two-sided Gaussian bounds for $\mathcal E_S(\Delta,M,x,T,f)$ can be found in [13]. That attempt requires some non-degeneracy conditions and holds only up to a systematic bias independent of $M$.

In the current work we assume that the coefficients satisfy the following mild smoothness condition:

(A) The coefficients $b,\sigma$ are uniformly Lipschitz continuous in space, uniformly in time, and $\sigma$ is bounded.

Note that (A) does not include any non-degeneracy condition on $\sigma$.



We next show that, when the innovations satisfy (GC($\alpha$)), the Gaussian concentration property transfers to the statistical error $E_M^\Delta(x,T,f)-\mathbb E_x[f(X_T^\Delta)]$. In particular, we get rid of the systematic bias in [13]. The key tool consists in writing the deviation using the same kind of decompositions that [16] exploits for the analysis of the discretization error.

Denote by $X_T^{\Delta,t_i,x}$ the value at time $T$ of the scheme (1.3) started from $x\in\mathbb R^d$ at time $t_i$, $i\in[0,N]$. Denote by $\mathcal F_i := \sigma(Y_j,\ 1\le j\le i)$ the filtration generated by the innovations. We write

$$f(X_T^{\Delta,0,x})-\mathbb E[f(X_T^{\Delta,0,x})] = \sum_{i=1}^N\mathbb E[f(X_T^{\Delta,0,x})|\mathcal F_i]-\mathbb E[f(X_T^{\Delta,0,x})|\mathcal F_{i-1}] = \sum_{i=1}^N\mathbb E[f(X_T^{\Delta,0,x})|X_{t_i}^{\Delta,0,x}]-\mathbb E[f(X_T^{\Delta,0,x})|X_{t_{i-1}}^{\Delta,0,x}],$$

using the Markov property for the last equality. Introducing the function $v^\Delta(t_i,x) := \mathbb E[f(X_T^\Delta)|X_{t_i}^\Delta = x]$, $(i,x)\in[0,N]\times\mathbb R^d$, we obtain

$$f(X_T^{\Delta,0,x})-\mathbb E[f(X_T^{\Delta,0,x})] = \sum_{i=1}^N v^\Delta(t_i,X_{t_i}^{\Delta,0,x})-v^\Delta(t_{i-1},X_{t_{i-1}}^{\Delta,0,x}). \tag{1.5}$$
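The definition of $v^\Delta$ now yields

$$f(X_T^{\Delta,0,x})-\mathbb E[f(X_T^{\Delta,0,x})] = \sum_{i=1}^N v^\Delta(t_i,X_{t_i}^{\Delta,0,x})-\mathbb E[v^\Delta(t_i,X_{t_i}^{\Delta,0,x})|X_{t_{i-1}}^{\Delta,0,x}] = \sum_{i=1}^N f_i^\Delta(X_{t_{i-1}}^{\Delta,0,x},\sqrt\Delta Y_i)-\mathbb E[f_i^\Delta(X_{t_{i-1}}^{\Delta,0,x},\sqrt\Delta Y_i)|\mathcal F_{i-1}], \tag{1.6}$$

where $f_i^\Delta(x,y) := \mathbb E[f(X_T^\Delta)|X_{t_i}^\Delta = x+b(t_{i-1},x)\Delta+\sigma(t_{i-1},x)y]$ for all $(i,x,y)\in[1,N]\times\mathbb R^d\times\mathbb R^q$.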
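The decompositions (1.5)-(1.6) can be made completely explicit in a linear toy case. This is an illustrative choice: $d = q = 1$, $b(t,x) = -x$, $\sigma\equiv1$, $f(x) = x$, with Rademacher innovations. Writing $a := 1-\Delta$, one gets $v^\Delta(t_i,x) = a^{N-i}x$ and $f_i^\Delta(x,y) = a^{N-i}(ax+y)$. Each increment therefore equals $a^{N-i}\sqrt\Delta Y_i$: a centred term whose Lipschitz constant in $y$ is $a^{N-i}$. The NumPy sketch below checks this on one simulated path.

```python
import numpy as np

rng = np.random.default_rng(3)
x0, T, N = 1.0, 1.0, 20
delta = T / N
a = 1.0 - delta  # one-step multiplier of the scheme with b(t, x) = -x, sigma = 1

# Scheme (1.3): X_{i+1} = a X_i + sqrt(delta) Y_{i+1}, with Rademacher innovations
Y = 2.0 * rng.integers(0, 2, size=N) - 1.0
X = np.empty(N + 1)
X[0] = x0
for i in range(N):
    X[i + 1] = a * X[i] + np.sqrt(delta) * Y[i]

def v(i, x):
    # v^Delta(t_i, x) = E[X_T^Delta | X_{t_i}^Delta = x] = a^(N - i) x for f(x) = x
    return a ** (N - i) * x

# Increments of (1.5): v(t_i, X_{t_i}) - v(t_{i-1}, X_{t_{i-1}}), i = 1..N
increments = np.array([v(i, X[i]) - v(i - 1, X[i - 1]) for i in range(1, N + 1)])

# Increments of (1.6): f_i^Delta(X_{t_{i-1}}, sqrt(delta) Y_i) - E[. | F_{i-1}] = a^(N - i) sqrt(delta) Y_i
expected = np.array([a ** (N - i) * np.sqrt(delta) * Y[i - 1] for i in range(1, N + 1)])

print(increments.sum(), X[N] - v(0, x0))
```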

The decomposition (1.5) is similar to the first step of the analysis of the discretization error. In that framework, $v^\Delta(t_i,X_{t_i}^{\Delta,0,x})$ is replaced by $v(t_i,X_{t_i}^{\Delta,0,x}) = \mathbb E[f(X_T^{t_i,X_{t_i}^{\Delta,0,x}})]$. This is the expectation involving the diffusion at time $T$ started from the current value of the scheme at $t_i$; see [16]. Under some non-degeneracy assumptions or smoothness of the coefficients, $v$ is smooth, and Itô-Taylor expansions lead to the previously mentioned first-order error for $\mathcal E_D(\Delta,x,T,f)$.

To analyze the statistical error, the key idea is to exploit recursively in (1.6) the fact that the increments of the scheme (1.3) satisfy (GC($\alpha$)). The Gaussian concentration property then follows readily, provided the $f_i^\Delta$ are Lipschitz in the variable $y$. Under (A), this regularity is derived from direct stability arguments using flow techniques; see Proposition 4.1 and its proof in Section A.1.

Let us mention here the work of Blower and Bolley [3], who obtained Gaussian concentration properties for the joint law of the first $n$ positions of stochastic processes (possibly non-Markovian) with values in general separable metric spaces. This result is in some sense much stronger than ours. For instance, it can yield non-asymptotic controls of the Monte Carlo error for smooth functionals of the path, such as the maximum. However, it requires continuity assumptions, in the Wasserstein metric, on the transition measures of the process; see e.g. condition (ii) in their Theorems 1.2 and 2.1. These assumptions are needed for the coupling techniques used in the proof. Checking this kind of continuity can be hard in practice. In [3], the authors give some sufficient conditions, which require the transition laws to be absolutely continuous and smooth; see their Proposition 2.2.



In the current work, we only need the property (GC($\alpha$)) for the innovations, which can in particular hold for discrete laws.

We also want to stress that the nature of the proofs is different, even though the concentration results coincide when the innovations $(Y_i)_{i\in\mathbb N^*}$ have a smooth density. Blower and Bolley exploit optimal transportation techniques, whereas our approach adapts to the current setting the PDE arguments used for the analysis of the discretization error. It is actually striking that a similar error decomposition can be used to investigate both the discretization error and the statistical error.

We conclude by mentioning some works on the deviations of the 1-Wasserstein distance between a reference measure and its empirical version. In the i.i.d. case, such results were first obtained for different concentration regimes by Bolley, Guillin and Villani [5], relying on a non-asymptotic version of Sanov's theorem. Boissard [4] also derived some of these results using concentration inequalities. He extended them to ergodic Markov chains, under some contractivity assumptions in the Wasserstein metric on the transition kernel. In the i.i.d. case and the Gaussian concentration regime, these results lead to estimates of the following type:

$$\mathbb P\Big(\sup_{f\ 1\text{-Lip}}\Big(\frac1M\sum_{j=1}^M f(Z_j)-\mathbb E[f(Z)]\Big)\ge r\Big)\le C(r)\exp(-KMr^2),$$

where the $(Z_j)_{j\in\mathbb N^*}$ are i.i.d. with the same distribution as $Z$. The constants $C(r)$ and $K$ can be computed explicitly, but tediously. Uniform deviation bounds of this kind are of practical interest in statistics and numerical probability. For example, they can lead to deviation bounds for the estimation of the density of the invariant measure of a Markov chain; see [5]. However, the (possibly large) constant $C(r)$ is the price to pay for uniform deviations over all Lipschitz functions. We do not intend to develop these aspects here, but similar bounds could be established in our context.



1.2. Robbins-Monro Stochastic Approximation Algorithm 

Besides the Euler scheme, we also derive non-asymptotic bounds for stochastic approximation algorithms of Robbins-Monro type. These recursive algorithms aim at finding a zero of a continuous function $h:\mathbb R^d\to\mathbb R^d$ that cannot be computed directly but only estimated through simulation. Such procedures are commonly used in convex optimization, since minimizing a function amounts to finding a zero of its gradient.

Precisely, the goal is to find a solution $\theta^*$ of $h(\theta) := \mathbb E[H(\theta,Y)] = 0$. Here $H:\mathbb R^d\times\mathbb R^q\to\mathbb R^d$ is a Borel function and $Y$ is a given $\mathbb R^q$-valued random variable. Even though $h(\theta)$ cannot be computed directly, we assume that $Y$ can be simulated easily (at least at a reasonable cost). We also assume that $H(\theta,y)$ can be computed easily for any couple $(\theta,y)\in\mathbb R^d\times\mathbb R^q$. The Robbins-Monro algorithm is the following recursive scheme:

$$\theta_{n+1} = \theta_n-\gamma_{n+1}H(\theta_n,Y_{n+1}),\quad n\ge0,\quad\theta_0\in\mathbb R^d, \tag{1.7}$$

where $(Y_n)_{n\ge1}$ is an i.i.d. $\mathbb R^q$-valued sequence of random variables defined on a probability space $(\Omega,\mathcal F,\mathbb P)$. The steps $(\gamma_n)_{n\ge1}$ form a sequence of non-negative deterministic numbers satisfying the usual assumption

$$\sum_{n\ge1}\gamma_n = +\infty,\qquad\sum_{n\ge1}\gamma_n^2<+\infty. \tag{1.8}$$
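A minimal sketch of the recursion (1.7), in plain NumPy. The function name `robbins_monro` and the test problem are illustrative assumptions. The toy problem is $H(\theta,y) = \theta-y$, so that $h(\theta) = \theta-\mathbb E[Y]$ and $\theta^* = \mathbb E[Y]$. It uses the steps $\gamma_n = 1/n$, which satisfy (1.8). In this special case, $\theta_N$ coincides with the empirical mean of $Y_1,\dots,Y_N$, which gives an exact check.

```python
import numpy as np

def robbins_monro(H, theta0, gammas, innovations):
    """Iterate theta_{n+1} = theta_n - gamma_{n+1} H(theta_n, Y_{n+1}) along the given sequences."""
    theta = np.asarray(theta0, dtype=float)
    for gamma, y in zip(gammas, innovations):
        theta = theta - gamma * H(theta, y)
    return theta

rng = np.random.default_rng(4)
N = 10_000
Y = rng.normal(loc=3.0, scale=1.0, size=N)   # target theta* = E[Y] = 3
gammas = 1.0 / np.arange(1, N + 1)           # gamma_n = c / n with c = 1 satisfies (1.8)

def H(theta, y):
    return theta - y                          # h(theta) = theta - E[Y]

theta_N = robbins_monro(H, theta0=0.0, gammas=gammas, innovations=Y)
print(theta_N, Y.mean())
```

Since $\gamma_1 = 1$, the starting point is forgotten after one step here. With other step sequences, $\theta_0$ influences the bias discussed below.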

When the function $h$ is the gradient of a potential, the iterative scheme (1.7) can be viewed as a stochastic gradient algorithm. Indeed, replacing $H(\theta_n,Y_{n+1})$ by $h(\theta_n)$ in (1.7) leads to the usual deterministic gradient method. One of the ideas in (1.7) is to take advantage of an averaging effect along the scheme, due to the specific form $h(\theta) := \mathbb E[H(\theta,Y)]$. This avoids the explicit computation or estimation of $h$.

We refer to [7], [11] for some general convergence results for the sequence $(\theta_n)_{n\ge0}$ defined by (1.7) towards its target $\theta^*$. These results assume the existence of a so-called Lyapunov function, i.e. a continuously differentiable function $L:\mathbb R^d\to\mathbb R_+$ such that $\nabla L$ is Lipschitz, $|\nabla L|^2\le C(1+L)$ for some positive constant $C$, and

$$\langle\nabla L,h\rangle\ge0.$$

See also [12] for a convergence theorem under the existence of a pathwise Lyapunov function. In the sequel, we assume that $\theta^*$ is the unique solution of the equation $h(\theta) = 0$ and that $(\theta_n)$ defined by (1.7) converges a.s. towards $\theta^*$.



We assume that the innovations $(Y_i)_{i\in\mathbb N^*}$ satisfy (GC($\alpha$)) for some $\alpha>0$. We also assume that the following conditions on the function $H$ and the step sequence $(\gamma_n)_{n\ge1}$ in (1.7) are in force:

(HL) The map $(\theta,y)\in\mathbb R^d\times\mathbb R^q\mapsto H(\theta,y)$ is uniformly Lipschitz continuous; its Lipschitz constant is denoted by $[H]_1$.

(HUA) The map $h:\theta\in\mathbb R^d\mapsto\mathbb E[H(\theta,Y)]$ is continuously differentiable in $\theta$, and there exists $\lambda>0$ such that $\forall\theta\in\mathbb R^d,\ \forall\xi\in\mathbb R^d,\ \lambda|\xi|^2\le\langle Dh(\theta)\xi,\xi\rangle$ (Uniform Attractivity).

To derive a Central Limit Theorem for the sequence $(\theta_n)_{n\ge1}$ as described in [7] or [11], it is commonly assumed that the matrix $Dh(\theta^*)$ is uniformly attractive. In our framework, this local condition on the Jacobian matrix of $h$ at the equilibrium is replaced by the uniform assumption (HUA). This allows us to derive non-asymptotic concentration bounds that are uniform w.r.t. the starting point $\theta_0$.
Note that (HUA) can be combined with the linear growth assumption

$$\forall\theta\in\mathbb R^d,\quad\mathbb E[|H(\theta,Y)|^2]\le C(1+|\theta-\theta^*|^2),$$

which is satisfied if (HL) holds and $Y\in L^2(\mathbb P)$. Under these two assumptions, the function $L:\theta\mapsto\frac12|\theta-\theta^*|^2$ is a Lyapunov function for the recursive procedure defined by (1.7). One then easily deduces that $\theta_n\to\theta^*$ a.s. as $n\to+\infty$.
As for the Euler scheme, we decompose the global error between the stochastic approximation procedure $\theta_n$ at a given time step $n$ and its target $\theta^*$ as follows:

$$|z_n| := |\theta_n-\theta^*| = \big(|\theta_n-\theta^*|-\mathbb E[|\theta_n-\theta^*|]\big)+\mathbb E[|\theta_n-\theta^*|] =: \mathcal E_{Emp}(\gamma,n,H,\lambda,\alpha)+\delta_n, \tag{1.9}$$

where $\delta_n := \mathbb E[|\theta_n-\theta^*|]$.

The term $\mathcal E_{Emp}(\gamma,n,H,\lambda,\alpha)$ is the difference between the absolute value of the error at time $n$ and its mean, and can be viewed as an empirical error. As for the Euler scheme, the Gaussian concentration property transfers to this quantity under (HL) and (HUA). The strategy again consists in introducing a telescopic sum of conditional expectations. For all $i\in\mathbb N$, denote $\mathcal F_i := \sigma(Y_j,\ j\le i)$, so that $(\mathcal F_i)_{i\in\mathbb N}$ is the natural filtration of the algorithm. We write, for all $n\in\mathbb N^*$,

$$\mathcal E_{Emp}(\gamma,n,H,\lambda,\alpha) = |z_n|-\mathbb E[|z_n|] = \sum_{i=1}^n\mathbb E[|z_n||\mathcal F_i]-\mathbb E[|z_n||\mathcal F_{i-1}] = \sum_{i=1}^n f_i^n(\theta_{i-1},Y_i)-\mathbb E[f_i^n(\theta_{i-1},Y_i)|\mathcal F_{i-1}],$$

where we used the Markov property for the last equality. We also introduced the notation $v_i(\theta) := \mathbb E[|\theta_n-\theta^*|\,|\,\theta_i = \theta]$ for all $(i,\theta)\in[1,n]\times\mathbb R^d$, and $f_i^n(\theta,y) := v_i(\theta-\gamma_iH(\theta,y))$. The stability of the Gaussian concentration property is then derived from the fact that the $f_i^n$ are Lipschitz in the variable $y$; see Proposition 5.1.

The term $\delta_n$ in (1.9) is the bias of the sequence $(\theta_n)_{n\ge0}$ with respect to its target $\theta^*$. This contribution strongly depends on the choice of the step sequence $(\gamma_n)_{n\ge1}$ and of the initial point $\theta_0$. Under (HL) and (HUA), we analyze this quantity in Proposition 5.2.



2. Main Results 

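2.1. Deviations for the Euler Scheme

Theorem 2.1 (Concentration Bounds for the Euler scheme). Denote by $X_T^\Delta$ the value at time $T$ of the scheme (1.3) associated with the diffusion (1.2). Assume that the innovations $(Y_i)_{i\in\mathbb N^*}$ in (1.3) satisfy (GC($\alpha$)) for some $\alpha>0$, and that the coefficients $b,\sigma$ satisfy (A). Let $f$ be a real-valued uniformly Lipschitz continuous function on $\mathbb R^d$. Then, for all $M\in\mathbb N^*$ and all $r>0$,

$$\mathbb P\Big[\Big|\frac1M\sum_{j=1}^M f\big((X_T^\Delta)^j\big)-\mathbb E_x[f(X_T^\Delta)]\Big|\ge r\Big]\le2\exp\Big(-\frac{Mr^2}{\Psi(T,f,b,\sigma,q)}\Big),$$

$$\Psi(T,f,b,\sigma,q) := 4\alpha[f]_1^2T|\sigma|_\infty^2\exp\big(2([b]_1+c[\sigma]_1(1\vee c[\sigma]_1))T\big).$$

Here $q$ is the dimension of the underlying Brownian motion in (1.2), and $c := c(q)$. The quantities $[f]_1,[b]_1,[\sigma]_1$ are the Lipschitz constants of $f,b,\sigma$, and $|\sigma|_\infty$ is the supremum of $\sigma$.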
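Theorem 2.1 can be turned into a confidence radius or a sample-size rule by solving $2\exp(-Mr^2/\Psi) = \text{level}$ for $r$ or for $M$. The sketch below does this with the standard `math` module; the function names are hypothetical. The constant $c(q)$ is not made explicit in the text, so the code takes it as an input. The example uses $b\equiv0$ and a constant $\sigma\equiv1$. Then $[b]_1 = [\sigma]_1 = 0$, so $\Psi = 8T$ for Gaussian innovations ($\alpha = 2$) whatever $c$ is, and the placeholder value of `c` plays no role.

```python
import math

def psi(T, f_lip, b_lip, sigma_lip, sigma_sup, alpha, c):
    """Psi(T, f, b, sigma, q) of Theorem 2.1; c stands for the unspecified constant c(q)."""
    rate = b_lip + c * sigma_lip * max(1.0, c * sigma_lip)
    return 4.0 * alpha * f_lip**2 * T * sigma_sup**2 * math.exp(2.0 * rate * T)

def tail_bound(r, M, psi_value):
    # Right-hand side of Theorem 2.1: 2 exp(-M r^2 / Psi)
    return 2.0 * math.exp(-M * r**2 / psi_value)

def confidence_radius(M, psi_value, level):
    # r such that 2 exp(-M r^2 / Psi) = level
    return math.sqrt(psi_value * math.log(2.0 / level) / M)

def sample_size(r, psi_value, level):
    # Smallest M such that 2 exp(-M r^2 / Psi) <= level
    return math.ceil(psi_value * math.log(2.0 / level) / r**2)

# Gaussian innovations (alpha = 2), b = 0, sigma = 1, f 1-Lipschitz, T = 1: Psi = 8 T.
psi_value = psi(T=1.0, f_lip=1.0, b_lip=0.0, sigma_lip=0.0, sigma_sup=1.0, alpha=2.0, c=1.0)
radius = confidence_radius(M=10_000, psi_value=psi_value, level=0.05)
M_needed = sample_size(r=0.01, psi_value=psi_value, level=0.05)
print(psi_value, radius, M_needed)
```

The resulting radius is conservative compared with a CLT-based interval. In exchange, it holds for every $M$, not only asymptotically.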

Note that the above theorem does not require any non-degeneracy condition on the diffusion coefficient. As developed in Section 1.1 (see (1.6)), to handle the previous quantity we rewrite

$$f(X_T^\Delta)-\mathbb E[f(X_T^\Delta)] = \sum_{i=1}^N f_i^\Delta(X_{t_{i-1}}^\Delta,\sqrt\Delta Y_i)-\mathbb E[f_i^\Delta(X_{t_{i-1}}^\Delta,\sqrt\Delta Y_i)|\mathcal F_{i-1}],$$

where $f_i^\Delta(x,y) := \mathbb E[f(X_T^\Delta)|X_{t_i}^\Delta = x+b(t_{i-1},x)\Delta+\sigma(t_{i-1},x)y]$ for all $(i,x,y)\in[1,N]\times\mathbb R^d\times\mathbb R^q$. Suppose that, at some point along the time discretization, the process has a degenerate diffusion term. Then the corresponding difference $f_i^\Delta(X_{t_{i-1}}^\Delta,\sqrt\Delta Y_i)-\mathbb E[f_i^\Delta(X_{t_{i-1}}^\Delta,\sqrt\Delta Y_i)|\mathcal F_{i-1}]$ gives no additional contribution to the global deviation.
Compared with the previous work [13], we got rid of the systematic bias. However, the concentration constants now depend on the Lipschitz constant of the function $v^\Delta(0,x) := \mathbb E[f(X_T^\Delta)|X_0^\Delta = x]$, which is of order $\Psi(T,f,b,\sigma,q)^{1/2}$. This magnitude is the product of the Lipschitz constant of the final function $f$ and the mean of the Lipschitz constant of the flow of the scheme. The latter factor is responsible for the exponential dependence in time; see Proposition 4.1 and its proof for details.



Remark 2.1 (Extension to smooth functionals of the path). The previous concentration results could be extended to some smooth functionals of the path, such as the maximum of a scalar scheme. Indeed, introduce in that case the additional state variable $(M_{t_i}^\Delta)_{i\in\mathbb N} := (\max_{j\in[0,i]}X_{t_j}^\Delta)_{i\in\mathbb N}$. The couple $(X_{t_i}^\Delta,M_{t_i}^\Delta)_{i\in\mathbb N}$ is Markovian, and the flow arguments of Proposition 4.1 could be extended to this couple for functions that are Lipschitz in both variables.

Remark 2.2 (Linear SDEs and concentration). It is the boundedness of $\sigma$ that gives the Gaussian concentration regime. However, in many popular models in finance the diffusion coefficient is linear. An example is the Black-Scholes-like dynamics $X_t = x_0+\int_0^t b(X_s)X_s\,ds+\int_0^t\sigma(X_s)X_s\,dW_s$ with smooth, bounded coefficients $b,\sigma$. Consider the estimation of $\mathbb E[f(X_T^\Delta)]$ for the associated Euler scheme. If $f$ is bounded, Gaussian concentration holds for the statistical error, by the Bolley and Villani criterion applied to $f(X_T^\Delta)$. For a general Lipschitz function, however, the expected concentration is the log-normal one.

2.2. Deviations for Robbins-Monro algorithms 

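Theorem 2.2 (Concentration Bounds for Robbins-Monro algorithms). Assume the following:

- the function $H$ of the recursive procedure $(\theta_n)_{n\ge0}$ defined by (1.7), with starting point $\theta_0\in\mathbb R^d$, satisfies (HL) and (HUA);
- the step sequence $(\gamma_n)_{n\ge1}$ satisfies (1.8);
- the law of the innovations satisfies (GC($\alpha$)), $\alpha>0$.

Then, for all $N\in\mathbb N^*$ and all $r>0$,

$$\mathbb P\big(|\theta_N-\theta^*|\ge r+\delta_N\big)\le\exp\Big(-\frac{r^2}{\alpha[H]_1^2\,\Pi_N\sum_{k=1}^N\gamma_k^2/\Pi_k}\Big),$$

where $\Pi_N := \prod_{k=0}^{N-1}(1-2\lambda\gamma_{k+1}+[H]_1^2\gamma_{k+1}^2)$ and $\delta_N := \mathbb E[|\theta_N-\theta^*|]$. Moreover, the bias $\delta_N$ at step $N$ satisfies

$$\delta_N\le\exp(-\lambda\Gamma_N)|\theta_0-\theta^*|+[H]_1\sigma_Y\Big(\sum_{k=0}^{N-1}e^{-2\lambda(\Gamma_N-\Gamma_{k+1})}\gamma_{k+1}^2\Big)^{1/2},$$

where $\Gamma_N := \sum_{k=1}^N\gamma_k$ and $\sigma_Y := \mathbb E[F^2(Y)]^{1/2}<+\infty$, with $F:y\mapsto\mathbb E[|y-Y|]$.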
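The two right-hand sides of Theorem 2.2 are explicit functions of the step sequence, and can be evaluated with a small sketch (standard `math` module, hypothetical function names). Two recursions that follow directly from the definitions avoid dividing by the possibly tiny $\Pi_k$:

- $S_N := \Pi_N\sum_{k\le N}\gamma_k^2/\Pi_k$ satisfies $S_n = a_nS_{n-1}+\gamma_n^2$, with $a_n = 1-2\lambda\gamma_n+[H]_1^2\gamma_n^2$;
- $B_N := \sum_{k<N}e^{-2\lambda(\Gamma_N-\Gamma_{k+1})}\gamma_{k+1}^2$ satisfies $B_n = e^{-2\lambda\gamma_n}B_{n-1}+\gamma_n^2$.

The constants $\lambda$, $[H]_1$, $\alpha$ and $\sigma_Y$ used below are illustrative placeholders, not values from the text.

```python
import math

def theorem_2_2_quantities(gammas, lam, H_lip):
    """Return (S_N, B_N, Gamma_N) for the steps gammas = (gamma_1, ..., gamma_N).

    S_N = Pi_N * sum_{k<=N} gamma_k^2 / Pi_k, via S_n = a_n S_{n-1} + gamma_n^2,
          with a_n = 1 - 2 lam gamma_n + [H]_1^2 gamma_n^2 (so that Pi_n = a_1 ... a_n);
    B_N = sum_{k<N} exp(-2 lam (Gamma_N - Gamma_{k+1})) gamma_{k+1}^2,
          via B_n = exp(-2 lam gamma_n) B_{n-1} + gamma_n^2.
    """
    S = B = Gamma = 0.0
    for g in gammas:
        S = (1.0 - 2.0 * lam * g + H_lip**2 * g**2) * S + g**2
        B = math.exp(-2.0 * lam * g) * B + g**2
        Gamma += g
    return S, B, Gamma

def deviation_bound(r, S, alpha, H_lip):
    # exp(-r^2 / (alpha [H]_1^2 Pi_N sum_k gamma_k^2 / Pi_k))
    return math.exp(-r**2 / (alpha * H_lip**2 * S))

def bias_bound(Gamma, B, lam, H_lip, sigma_Y, init_dist):
    # exp(-lam Gamma_N) |theta_0 - theta*| + [H]_1 sigma_Y B_N^{1/2}
    return math.exp(-lam * Gamma) * init_dist + H_lip * sigma_Y * math.sqrt(B)

# Placeholder constants: lam = 1, [H]_1 = 1.5, alpha = 2, sigma_Y = 1, |theta_0 - theta*| = 1.
N, c = 10_000, 1.0
S, B, Gamma = theorem_2_2_quantities([c / n for n in range(1, N + 1)], lam=1.0, H_lip=1.5)
print(deviation_bound(r=0.05, S=S, alpha=2.0, H_lip=1.5),
      bias_bound(Gamma, B, lam=1.0, H_lip=1.5, sigma_Y=1.0, init_dist=1.0))
```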

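Concerning the choice of the step sequence $(\gamma_n)_{n\ge1}$ and its impact on the concentration rate and the bias, we obtain the following results:

• If we choose $\gamma_n = \frac cn$ with $c>0$, then $\delta_N\to0$ as $N\to+\infty$. Moreover, $\Gamma_N = c\log(N)+c_1'+r_N$ with $c_1'>0$ and $r_N\to0$, so that $\Pi_N = O(N^{-2c\lambda})$.

  - If $c<\frac1{2\lambda}$, the series $\sum_{k\ge1}\gamma_k^2/\Pi_k$ converges, so that $\Pi_N\sum_{k=1}^N\gamma_k^2/\Pi_k = O(N^{-2c\lambda})$.

  - If $c>\frac1{2\lambda}$, comparing the series with the corresponding integral yields $\Pi_N\sum_{k=1}^N\gamma_k^2/\Pi_k = O(N^{-1})$.

  We find the same critical level for the constant $c$ as in the Central Limit Theorem for stochastic algorithms. Indeed, let $\lambda_{\min}$ denote the eigenvalue of $Dh(\theta^*)$ with the smallest real part. If $c>\frac1{2\,\mathrm{Re}(\lambda_{\min})}$, then a Central Limit Theorem holds for $(\theta_n)_{n\ge1}$ (see e.g. [7]). In our framework, however, this local condition on the Jacobian matrix of $h$ at the equilibrium is replaced by a uniform assumption. This is natural, since we want non-asymptotic bounds for the stochastic approximation (1.7).

  Concerning the bias, we have the following bound:

  $$\delta_N\le\frac{K}{N^{\lambda c}}|\theta_0-\theta^*|+[H]_1\sigma_Y\Big(\sum_{k=0}^{N-1}e^{-2\lambda(\Gamma_N-\Gamma_{k+1})}\gamma_{k+1}^2\Big)^{1/2},\quad K := K(c).$$

• If we choose $\gamma_n = \frac c{n^p}$ with $c>0$ and $\frac12<p<1$, then $\delta_N\to0$ and $\Gamma_N\sim\frac c{1-p}N^{1-p}$ as $N\to+\infty$. Elementary computations show that there exists $C>0$ such that, for all $N\ge1$, $\Pi_N\le C\exp\big(-\frac{2\lambda c}{1-p}N^{1-p}\big)$. Hence, for all $\varepsilon\in(0,1-p)$, with $C$ a constant that may change from line to line,

  $$\Pi_N\sum_{k=1}^N\frac{\gamma_k^2}{\Pi_k} = \Pi_N\sum_{k=1}^{N-N^{p+\varepsilon}}\frac{\gamma_k^2}{\Pi_k}+\Pi_N\sum_{k=N-N^{p+\varepsilon}+1}^{N}\frac{\gamma_k^2}{\Pi_k}$$
  $$\le c^2C\exp\Big(-\frac{2\lambda c}{1-p}\big(N^{1-p}-(N-N^{p+\varepsilon})^{1-p}\big)\Big)\sum_{k\ge1}\frac1{k^{2p}}+Cc^2\frac{N^{p+\varepsilon}}{(N-N^{p+\varepsilon}+1)^{2p}}$$
  $$\le Cc^2\exp(-2\lambda cN^{\varepsilon})+\frac C{N^{p-\varepsilon}},$$

  where we used $N^{1-p}-(N-N^{p+\varepsilon})^{1-p}\ge(1-p)N^{\varepsilon}$. Up to a modification of $\varepsilon$, this yields $\Pi_N\sum_{k=1}^N\gamma_k^2/\Pi_k = O(N^{-p+\varepsilon})$, $\varepsilon\in(0,1-p)$. Concerning the bias, the above control gives the following bound:

  $$\delta_N\le K\exp\Big(-\frac{\lambda c}{1-p}N^{1-p}\Big)|\theta_0-\theta^*|+[H]_1\sigma_Y\frac K{N^{p/2-\varepsilon}},\quad K := K(c),\ \forall\varepsilon>0.$$

  Since each step is larger than in the case $\gamma_n = \frac cn$, the impact of the initial difference $|\theta_0-\theta^*|$ is exponentially small.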
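The rates in both bullets above can be checked numerically by computing $S_N := \Pi_N\sum_{k\le N}\gamma_k^2/\Pi_k$ through the recursion $S_n = (1-2\lambda\gamma_n+[H]_1^2\gamma_n^2)S_{n-1}+\gamma_n^2$. The sketch below uses the standard library only; the function names and the constants $\lambda = 1$, $[H]_1 = 1.5$ are illustrative assumptions, with $[H]_1\ge\lambda$ as implied by (HL) and (HUA). It estimates the exponent $\beta$ in $S_N\approx N^{-\beta}$ from $\log_2(S_N/S_{2N})$. The exponent should approach:

- $\min(2c\lambda,1)$ for $\gamma_n = c/n$;
- $p$ for $\gamma_n = c/n^p$, consistent with the $O(N^{-p+\varepsilon})$ rate stated above.

```python
import math

def variance_proxies(c, p, lam, H_lip, N_max):
    """S_n = Pi_n * sum_{k<=n} gamma_k^2 / Pi_k for gamma_n = c / n^p, n = 1..N_max (index n - 1)."""
    S, out = 0.0, []
    for n in range(1, N_max + 1):
        g = c / n**p
        S = (1.0 - 2.0 * lam * g + H_lip**2 * g**2) * S + g**2
        out.append(S)
    return out

def empirical_exponent(c, p, lam, H_lip, N):
    # beta such that S_N ~ N^{-beta}, estimated from log2(S_N / S_{2N})
    S = variance_proxies(c, p, lam, H_lip, 2 * N)
    return math.log2(S[N - 1] / S[2 * N - 1])

lam, H_lip, N = 1.0, 1.5, 100_000
cases = [(0.25, 1.0, min(2 * 0.25 * lam, 1.0)),   # gamma_n = c/n with c < 1/(2 lam)
         (1.0, 1.0, min(2 * 1.0 * lam, 1.0)),     # gamma_n = c/n with c > 1/(2 lam)
         (1.0, 0.75, 0.75)]                        # gamma_n = c/n^p with 1/2 < p < 1
estimates = []
for c, p, predicted in cases:
    beta = empirical_exponent(c, p, lam, H_lip, N)
    estimates.append((beta, predicted))
    print(f"c = {c}, p = {p}: estimated exponent {beta:.3f}, predicted {predicted}")
```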

3. Abstract concentration properties for a general evolution scheme 

In this section, we assume that $(Y_i)_{i\in\mathbb N^*}$ is a sequence of i.i.d. $\mathbb R^q$-valued random variables whose law $\mu$ satisfies the Gaussian concentration property (GC($\alpha$)) for a given $\alpha>0$.

Proposition 3.1 (Gaussian concentration for a stochastic evolution scheme). Fix $N\in\mathbb N^*$. For all $i\in[1,N]$, define

$$\Delta_i := f_i(X_{i-1},Y_i)-\mathbb E[f_i(X_{i-1},Y_i)|\mathcal F_{i-1}],$$

where the $X_{i-1}$ are $\mathcal F_{i-1}$-measurable random variables. The real-valued functions $(f_i)_{i\in[1,N]}$ are assumed Lipschitz continuous in the variable $y$, uniformly in $x$, with constants $([f_i]_1)_{i\in[1,N]}$. Let $(\gamma_i)_{i\in[1,N]}$ be a given sequence of time steps. Then, for all $r>0$,

$$\mathbb P\Big[\sum_{i=1}^N\gamma_i\Delta_i\ge r\Big]\le\exp\Big(-\frac{r^2}{\alpha\sum_{i=1}^N([f_i]_1\gamma_i)^2}\Big).$$

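Proof. Set $\mathcal P(r) := \mathbb P[\sum_{i=1}^N\gamma_i\Delta_i\ge r]$. For $\lambda>0$ to be specified later on, the exponential Chebyshev inequality yields

$$\mathcal P(r)\le\exp(-\lambda r)\,\mathbb E\Big[\exp\Big(\lambda\sum_{i=1}^N\gamma_i\Delta_i\Big)\Big] = \exp(-\lambda r)\,\mathbb E\Big[\exp\Big(\lambda\sum_{i=1}^{N-1}\gamma_i\Delta_i\Big)\,\mathbb E\big[\exp(\lambda\gamma_N\Delta_N)\,|\,\mathcal F_{N-1}\big]\Big]. \tag{3.1}$$

Working with regular conditional expectations, we have

$$\mathbb E[\exp(\lambda\gamma_N\Delta_N)|\mathcal F_{N-1}] = \mathbb E_Y\big[\exp\big(\lambda\gamma_N(f_N(x,Y)-\mathbb E[f_N(x,Y)])\big)\big]\Big|_{x = X_{N-1}},$$

where $Y$ is a random variable with law $\mu$. Applying (GC($\alpha$)) to the 1-Lipschitz function $f_N(x,\cdot)/[f_N]_1$, we derive

$$\mathbb E[\exp(\lambda\gamma_N\Delta_N)|\mathcal F_{N-1}]\le\exp\big(\alpha([f_N]_1\gamma_N\lambda)^2/4\big).$$

Plugging this estimate into (3.1) and iterating the procedure, we derive

$$\mathcal P(r)\le\exp(-\lambda r)\exp\Big(\frac{\alpha\lambda^2}4\sum_{i=1}^N([f_i]_1\gamma_i)^2\Big).$$

Optimizing w.r.t. $\lambda$, i.e. taking $\lambda = 2r/(\alpha\sum_{i=1}^N([f_i]_1\gamma_i)^2)$, we obtain $\mathcal P(r)\le\exp\Big(-\frac{r^2}{\alpha\sum_{i=1}^N([f_i]_1\gamma_i)^2}\Big)$.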
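Proposition 3.1 can be illustrated by a Monte Carlo experiment on a genuine martingale. This is a NumPy sketch with hypothetical choices: one-dimensional Rademacher innovations, which satisfy (GC(2)) by Hoeffding's lemma, and $f_i(x,y) = \cos(x)\,y$. This function is 1-Lipschitz in $y$ uniformly in $x$ and centred in $y$, so $\Delta_i = \cos(X_{i-1})Y_i$. The state is fed back through $X_i = X_{i-1}+\Delta_i$, so the increments are dependent but remain martingale increments. With $\gamma_i = 1$, the bound reads $\exp(-r^2/(2N))$.

```python
import numpy as np

rng = np.random.default_rng(5)
N, n_paths = 50, 20_000
alpha = 2.0  # one-dimensional Rademacher law: (GC(2)) by Hoeffding's lemma

X = np.zeros(n_paths)       # X_{i-1}, an F_{i-1}-measurable state
total = np.zeros(n_paths)   # running sum of gamma_i * Delta_i with gamma_i = 1
for _ in range(N):
    Y = 2.0 * rng.integers(0, 2, size=n_paths) - 1.0
    # f_i(x, y) = cos(x) * y is 1-Lipschitz in y uniformly in x and E[f_i(x, Y)] = 0,
    # so Delta_i = f_i(X_{i-1}, Y_i) - E[f_i(X_{i-1}, Y_i) | F_{i-1}] = cos(X_{i-1}) Y_i.
    Delta = np.cos(X) * Y
    total += Delta
    X = X + Delta

lip_sq_sum = float(N)       # sum_i ([f_i]_1 gamma_i)^2
results = []
for r in np.sqrt(N) * np.array([1.0, 2.0, 3.0]):
    empirical = float(np.mean(total >= r))
    bound = float(np.exp(-r**2 / (alpha * lip_sq_sum)))
    results.append((r, empirical, bound))
    print(f"r = {r:5.2f}  empirical tail = {empirical:.4f}  Proposition 3.1 bound = {bound:.4f}")
```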

4. Euler Scheme: Proof of the Main Results 

To apply Proposition 3.1 to the decomposition (1.6), all we need is a control on the Lipschitz modulus, in the variable $y$, of the functions $f_i^\Delta(x,y)$, uniformly in $x$. Under the assumptions of Theorem 2.1, we have the following proposition, which is proved in Section A.1.



Proposition 4.1 (Control of the Lipschitz constants). Denote by $[b]_1$ and $[\sigma]_1$ the respective Lipschitz constants of $b$ and $\sigma$ in (1.3), and by $|\sigma|_\infty$ the supremum of $\sigma$. Then, for all $i\in[1,N]$, the functions $f_i^\Delta$ introduced after (1.6) are Lipschitz continuous in the variable $y$, uniformly in $x$. More precisely, there exists $c := c(q)$, where $q$ is the dimension of the underlying Brownian motion, such that

$$[f_i^\Delta]_1 := \sup_{x\in\mathbb R^d,\,y\neq y'}\frac{|f_i^\Delta(x,y)-f_i^\Delta(x,y')|}{|y-y'|}\le2[f]_1|\sigma|_\infty\exp\big(\{[b]_1+c[\sigma]_1(1\vee c[\sigma]_1)\}(T-t_i)\big),$$

where $[f]_1$ stands for the Lipschitz constant of the function $f$.

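Set $\gamma_i = 1$ and $\Delta_i = f_i^\Delta(X_{t_{i-1}}^\Delta,\sqrt\Delta Y_i)-\mathbb E[f_i^\Delta(X_{t_{i-1}}^\Delta,\sqrt\Delta Y_i)|\mathcal F_{i-1}]$ for all $i\in[1,N]$. The random variable $\sqrt\Delta Y_i$ satisfies the Gaussian concentration property (GC($\Delta\alpha$)). We therefore derive from Propositions 3.1 and 4.1 that

$$\mathbb P_x\big[f(X_T^\Delta)-\mathbb E_x[f(X_T^\Delta)]\ge r\big]\le\exp\Big(-\frac{r^2}{\Delta\alpha\sum_{i=1}^N[f_i^\Delta]_1^2}\Big)\le\exp\Big(-\frac{r^2}{4\alpha T[f]_1^2|\sigma|_\infty^2\exp\big(2([b]_1+c[\sigma]_1(1\vee c[\sigma]_1))T\big)}\Big) = \exp\Big(-\frac{r^2}{\Psi(T,f,b,\sigma,q)}\Big).$$

The proof of Proposition 3.1 actually bounds the exponential moment, not only the tail, and $\Psi$ is proportional to $[f]_1^2$. Hence the random variable $X_T^\Delta$ satisfies (GC($\beta$)) for $\beta := \Psi(T,f,b,\sigma,q)/[f]_1^2$. The bound of Theorem 2.1 now follows from a simple tensorization argument for independent random variables satisfying the Gaussian concentration property. Namely, for $\lambda,r>0$,

$$\mathbb P\Big[\frac1M\sum_{j=1}^M f\big((X_T^\Delta)^j\big)-\mathbb E_x[f(X_T^\Delta)]\ge r\Big]\le\exp(-\lambda r)\,\mathbb E\Big[\exp\Big(\frac\lambda M\sum_{j=1}^M\big(f((X_T^\Delta)^j)-\mathbb E_x[f(X_T^\Delta)]\big)\Big)\Big]\overset{(\mathrm{GC}(\beta))}{\le}\exp\Big(-\lambda r+\frac{\Psi(T,f,b,\sigma,q)\lambda^2}{4M}\Big).$$

Optimizing in $\lambda$ gives the result. Applying the same bound to $-f$ yields the factor 2 in Theorem 2.1.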
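The tensorization bound can be compared with simulation in a case where $c(q)$ plays no role. The case is $d = q = 1$, $b\equiv0$, $\sigma\equiv1$, $f(x) = x$, with Rademacher innovations ($\alpha = 2$). Then $[b]_1 = [\sigma]_1 = 0$ and $\Psi = 4\alpha T = 8T$. This NumPy sketch, whose parameters are arbitrary choices, draws many independent realisations of the estimator and compares the empirical frequency of $|E_M^\Delta-\mathbb E_x[f(X_T^\Delta)]|\ge r$ with $2\exp(-Mr^2/\Psi)$. The bound is visibly conservative here: the exact variance proxy of $X_T^\Delta$ is $2T$ rather than $8T$, because Proposition 4.1 carries a factor 2.

```python
import numpy as np

rng = np.random.default_rng(6)
x0, T, N, M, n_rep = 0.0, 1.0, 16, 20, 10_000
alpha, delta = 2.0, T / N
psi = 4.0 * alpha * T   # [f]_1 = |sigma|_inf = 1 and [b]_1 = [sigma]_1 = 0: c(q) plays no role

# n_rep independent realisations of the estimator, each built from M copies of the scheme
Y = 2.0 * rng.integers(0, 2, size=(n_rep, M, N)) - 1.0
XT = x0 + np.sqrt(delta) * Y.sum(axis=2)     # scheme (1.3) with b = 0, sigma = 1
errors = np.abs(XT.mean(axis=1) - x0)         # |E_M^Delta(x, T, f) - E_x[f(X_T^Delta)]|, f(x) = x

std = np.sqrt(T / M)                          # standard deviation of the estimator
checks = []
for k in (3.0, 4.0, 5.0):
    r = k * std
    empirical = float(np.mean(errors >= r))
    bound = 2.0 * np.exp(-M * r**2 / psi)
    checks.append((r, empirical, bound))
    print(f"r = {r:.3f}  empirical = {empirical:.5f}  Theorem 2.1 bound = {bound:.4f}")
```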



5. Robbins-Monro Algorithm: Proof of the Main Results

With the notation of Section 1.2, applying Proposition 3.1 requires a control of the Lipschitz constants, in $y$, of the functions $f_i^n(\theta,y) = v_i(\theta-\gamma_iH(\theta,y))$, $\forall(i,\theta,y)\in[1,n]\times\mathbb R^d\times\mathbb R^q$, where $v_i(\theta) := \mathbb E[|\theta_n-\theta^*|\,|\,\theta_i = \theta]$. Under the assumptions of Theorem 2.2, the following control holds.



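Proposition 5.1 (Control of the Lipschitz constants). For all $i\in[1,n]$, the function $f_i^n$ satisfies

$$[f_i^n]_1 := \sup_{\theta\in\mathbb R^d,\,y\neq y'}\frac{|f_i^n(\theta,y)-f_i^n(\theta,y')|}{|y-y'|}\le(\Pi_n\Pi_i^{-1})^{1/2}\gamma_i[H]_1,$$

where $\Pi_n := \prod_{k=0}^{n-1}(1-2\lambda\gamma_{k+1}+[H]_1^2\gamma_{k+1}^2)$, $n\ge1$.

The proof is postponed to Section A.2.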
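Set $\Delta_i = f_i^N(\theta_{i-1},Y_i)-\mathbb E[f_i^N(\theta_{i-1},Y_i)|\mathcal F_{i-1}]$, $i\in[1,N]$. The random variables $(Y_i)_{i\in\mathbb N^*}$ satisfy (GC($\alpha$)). We therefore obtain from Proposition 3.1 (with $\gamma_i = 1$) and Proposition 5.1 (with $n = N$) that, for all $r>0$,

$$\mathbb P(|\theta_N-\theta^*|\ge r+\delta_N) = \mathbb P\big(|\theta_N-\theta^*|-\mathbb E[|\theta_N-\theta^*|]\ge r\big)\le\exp\Big(-\frac{r^2}{\alpha[H]_1^2\,\Pi_N\sum_{k=1}^N\gamma_k^2/\Pi_k}\Big).$$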

Unlike for the Euler scheme, a bias appears in the non-asymptotic bound for the stochastic approximation algorithm, so it is crucial to control it. At step $n$ of the algorithm, the bias is $\delta_n := \mathbb E[|\theta_n-\theta^*|]$. Under the current assumptions, (HL) (Lipschitz continuity of $H$) and (HUA) (uniform attractivity), we have the following proposition.

Proposition 5.2 (Control of the bias). For all $n\ge1$,

$$\delta_n\le\exp(-\lambda\Gamma_n)|\theta_0-\theta^*|+[H]_1\sigma_Y\Big(\sum_{k=0}^{n-1}e^{-2\lambda(\Gamma_n-\Gamma_{k+1})}\gamma_{k+1}^2\Big)^{1/2},$$

where $\Gamma_n := \sum_{k=1}^n\gamma_k$ and $\sigma_Y := \mathbb E[F^2(Y)]^{1/2}<+\infty$, with $F:y\mapsto\mathbb E[|y-Y|]$.

Proof. With the notation of Section 1.2, define for all $n\ge1$

$$\Delta M_n := h(\theta_{n-1})-H(\theta_{n-1},Y_n) = \mathbb E[H(\theta_{n-1},Y_n)|\mathcal F_{n-1}]-H(\theta_{n-1},Y_n).$$

Since $(Y_i)_{i\in\mathbb N^*}$ is a sequence of i.i.d. random variables, $(\Delta M_n)_{n\ge1}$ is a sequence of martingale increments w.r.t. the natural filtration $(\mathcal F_n := \sigma(Y_i,\ i\le n),\ n\ge1)$. From the dynamics (1.7), we now write, for all $n\in\mathbb N$,

$$z_{n+1} := \theta_{n+1}-\theta^* = \theta_n-\theta^*-\gamma_{n+1}\{h(\theta_n)-\Delta M_{n+1}\} = \theta_n-\theta^*-\gamma_{n+1}\int_0^1du\,Dh(\theta^*+u(\theta_n-\theta^*))(\theta_n-\theta^*)+\gamma_{n+1}\Delta M_{n+1},$$

where we used $h(\theta^*) = 0$ for the last equality. Setting $J_n := \int_0^1du\,Dh(\theta^*+u(\theta_n-\theta^*))$, we obtain

$$z_{n+1} = (I-\gamma_{n+1}J_n)z_n+\gamma_{n+1}\Delta M_{n+1}.$$

Now take the square of the $L^2$-norm in the previous equality. Since $\Delta M_{n+1}$ is a martingale increment, we derive

$$\mathbb E[|z_{n+1}|^2] = \mathbb E[|(I-\gamma_{n+1}J_n)z_n|^2]+\gamma_{n+1}^2\mathbb E[|\Delta M_{n+1}|^2]\le\exp(-2\lambda\gamma_{n+1})\mathbb E[|z_n|^2]+\gamma_{n+1}^2[H]_1^2\sigma_Y^2.$$

For the last inequality, we used two facts. First, by (HUA) (uniform attractivity of the Jacobian matrix of $h$), $\|I-\gamma_{n+1}J_n\|\le\exp(-\lambda\gamma_{n+1})$, where $\|\cdot\|$ is the matrix norm on $\mathbb R^d\otimes\mathbb R^d$. Second, $\mathbb E[|\Delta M_{n+1}|^2]\le[H]_1^2\sigma_Y^2$, which follows from (HL).

A direct induction yields, for all $n\ge1$,

$$\mathbb E[|z_n|^2]\le\exp(-2\lambda\Gamma_n)|z_0|^2+[H]_1^2\sigma_Y^2\sum_{k=0}^{n-1}e^{-2\lambda(\Gamma_n-\Gamma_{k+1})}\gamma_{k+1}^2.$$

Since $\delta_n\le\mathbb E[|z_n|^2]^{1/2}$ and $\sqrt{a+b}\le\sqrt a+\sqrt b$, this completes the proof.
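The second-moment recursion of this proof can be compared with an exactly solvable case, sketched below with the standard library only. The example is an illustrative assumption: $H(\theta,y) = \theta-y$ with Rademacher $Y$, so that $\lambda = 1$, $[H]_1 = 1$, $J_n = I$, $\mathrm{Var}(Y) = 1$, and $F(\pm1) = 1$, giving $\sigma_Y = 1$. For steps $\gamma\in[0,1]$, one has $\|I-\gamma J_n\| = 1-\gamma\le e^{-\gamma}$, so the key inequality of the proof holds. The exact value of $\mathbb E|z_N|^2$ should then stay below the bound obtained by iterating the recursion.

```python
import math

def exact_second_moment(z0, gammas, var_Y):
    # H(theta, y) = theta - y gives z_{n+1} = (1 - gamma_{n+1}) z_n + gamma_{n+1} (E[Y] - Y_{n+1}),
    # hence E|z_{n+1}|^2 = (1 - gamma_{n+1})^2 E|z_n|^2 + gamma_{n+1}^2 Var(Y) exactly.
    m = z0**2
    for g in gammas:
        m = (1.0 - g) ** 2 * m + g**2 * var_Y
    return m

def bias_recursion_bound(z0, gammas, lam, H_lip, sigma_Y):
    # Iterates E|z_{n+1}|^2 <= exp(-2 lam gamma_{n+1}) E|z_n|^2 + gamma_{n+1}^2 [H]_1^2 sigma_Y^2
    m = z0**2
    for g in gammas:
        m = math.exp(-2.0 * lam * g) * m + g**2 * H_lip**2 * sigma_Y**2
    return m

# Rademacher innovations: E[Y] = 0, Var(Y) = 1 and F(y) = E|y - Y| = 1 for y = +-1, so sigma_Y = 1.
z0, lam, H_lip, sigma_Y = 2.0, 1.0, 1.0, 1.0
comparisons = []
for N in (1, 10, 100, 1000):
    gammas = [0.5 / n for n in range(1, N + 1)]
    exact = exact_second_moment(z0, gammas, var_Y=1.0)
    bound = bias_recursion_bound(z0, gammas, lam, H_lip, sigma_Y)
    comparisons.append((N, exact, bound))
    print(f"N = {N:4d}  E|z_N|^2 = {exact:.5f}  bound = {bound:.5f}  delta_N <= {math.sqrt(bound):.4f}")
```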



Appendix A. Technical results 

A.1. Proof of Proposition 4.1

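The proof follows from usual stochastic analysis arguments, which we recall for the sake of completeness. For $i = N$, we get directly from the definition of $f_N^\Delta(x,y)$ that $[f_N^\Delta]_1\le[f]_1|\sigma|_\infty$.

For $i\in[1,N-1]$, set $G_{i-1}^\Delta(x,z) := b(t_{i-1},x)\Delta+\sigma(t_{i-1},x)z$ for all $z\in\mathbb R^q$, so that $f_i^\Delta(x,y) = \mathbb E[f(X_T^{\Delta,t_i,x+G_{i-1}^\Delta(x,y)})]$. For $y\neq y'$ and $j\in[i,N]$, define the quantity

$$D_i^\Delta(t_j,x,y,y') := \sup_{k\in[i,j]}\frac{\big|X_{t_k}^{\Delta,t_i,x+G_{i-1}^\Delta(x,y)}-X_{t_k}^{\Delta,t_i,x+G_{i-1}^\Delta(x,y')}\big|}{|y-y'|},$$

and note that $D_i^\Delta(T,\cdot) = D_i^\Delta(t_N,\cdot)$. From the dynamics (1.3), we now write

$$D_i^\Delta(T,x,y,y')\le\frac{|G_{i-1}^\Delta(x,y)-G_{i-1}^\Delta(x,y')|}{|y-y'|}+[b]_1\Delta\sum_{k=i}^{N-1}D_i^\Delta(t_k,x,y,y')+\sup_{j\in[i,N]}\Big|\sum_{k=i}^{j-1}\frac{\sigma(t_k,X_{t_k}^{\Delta,t_i,x+G_{i-1}^\Delta(x,y)})-\sigma(t_k,X_{t_k}^{\Delta,t_i,x+G_{i-1}^\Delta(x,y')})}{|y-y'|}\sqrt\Delta\,Y_{k+1}\Big|,$$

where the first term is bounded by $|\sigma|_\infty$. One can easily show that $X_{t_j}^{\Delta,t_i,x+G_{i-1}^\Delta(x,z)}\in L^1(\mathbb P)$ for all $(z,j)\in\mathbb R^q\times[0,N]$. Taking the expectation and using the Burkholder-Davis-Gundy inequality (see e.g. Chapter 7, §3 in Shiryaev [15]), we obtain, with $c := c(q)$,

$$\mathbb E[D_i^\Delta(T,x,y,y')]\le|\sigma|_\infty+[b]_1\Delta\sum_{k=i}^{N-1}\mathbb E[D_i^\Delta(t_k,x,y,y')]+c[\sigma]_1\mathbb E\Big[\Big(\Delta\sum_{k=i}^{N-1}D_i^\Delta(t_k,x,y,y')^2|Y_{k+1}|^2\Big)^{1/2}\Big]. \tag{A.1}$$

Now observe that $D_i^\Delta(t_k,\cdot)\le D_i^\Delta(T,\cdot)$ and that $D_i^\Delta(t_k,\cdot)$ is independent of $Y_{k+1}$. Hence, for all $\eta\in(0,1)$,

$$\mathbb E\Big[\Big(\Delta\sum_{k=i}^{N-1}D_i^\Delta(t_k,x,y,y')^2|Y_{k+1}|^2\Big)^{1/2}\Big]\le\mathbb E\Big[D_i^\Delta(T,x,y,y')^{1/2}\Big(\Delta\sum_{k=i}^{N-1}D_i^\Delta(t_k,x,y,y')|Y_{k+1}|^2\Big)^{1/2}\Big]\le\eta\,\mathbb E[D_i^\Delta(T,x,y,y')]+\eta^{-1}\Delta\sum_{k=i}^{N-1}\mathbb E[D_i^\Delta(t_k,x,y,y')]\,\mathbb E[|Y_1|^2].$$

Plugging this into (A.1) yields, thanks to the (discrete) Gronwall lemma,

$$(1-c[\sigma]_1\eta)\,\mathbb E[D_i^\Delta(T,x,y,y')]\le|\sigma|_\infty\exp\big(\{[b]_1+c[\sigma]_1\mathbb E[|Y_1|^2]\eta^{-1}\}(T-t_i)\big).$$

Taking $\eta := (2(1\vee c[\sigma]_1))^{-1}$, we obtain

$$\mathbb E[D_i^\Delta(T,x,y,y')]\le2|\sigma|_\infty\exp\big(\{[b]_1+2c\,\mathbb E[|Y_1|^2][\sigma]_1(1\vee c[\sigma]_1)\}(T-t_i)\big).$$

Recall that $[f_i^\Delta]_1\le[f]_1\sup_{x,\,y\neq y'}\mathbb E[D_i^\Delta(T,x,y,y')]$ and that $\mathbb E[|Y_1|^2] = q$. The proof is then complete up to a modification of $c$.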
A.2. Proof of Proposition 5.1

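Denote by $\theta_j^{i,\theta}$ the value at step $j\ge i$ of the procedure (1.7) started from $\theta$ at step $i$. From the definitions in Section 1.2, it suffices to control the difference $\mathbb E[|\theta_n^{i,\theta}-\theta_n^{i,\theta'}|]$, that is, the sensitivity of the algorithm w.r.t. its starting point at time $i\in[1,n]$. For all $j\in[i,n-1]$, write

$$|\theta_{j+1}^{i,\theta}-\theta_{j+1}^{i,\theta'}|^2 = |\theta_j^{i,\theta}-\theta_j^{i,\theta'}|^2-2\gamma_{j+1}\langle\theta_j^{i,\theta}-\theta_j^{i,\theta'},H(\theta_j^{i,\theta},Y_{j+1})-H(\theta_j^{i,\theta'},Y_{j+1})\rangle+\gamma_{j+1}^2|H(\theta_j^{i,\theta},Y_{j+1})-H(\theta_j^{i,\theta'},Y_{j+1})|^2$$
$$= |\theta_j^{i,\theta}-\theta_j^{i,\theta'}|^2-2\gamma_{j+1}\langle\theta_j^{i,\theta}-\theta_j^{i,\theta'},h(\theta_j^{i,\theta})-h(\theta_j^{i,\theta'})\rangle-2\gamma_{j+1}\langle\theta_j^{i,\theta}-\theta_j^{i,\theta'},\Delta M_{j+1}^{i,\theta}-\Delta M_{j+1}^{i,\theta'}\rangle+\gamma_{j+1}^2|H(\theta_j^{i,\theta},Y_{j+1})-H(\theta_j^{i,\theta'},Y_{j+1})|^2.$$

In the last equality, we introduced the martingale increments $\Delta M_{j+1}^{i,\theta} := H(\theta_j^{i,\theta},Y_{j+1})-h(\theta_j^{i,\theta})$ and $\Delta M_{j+1}^{i,\theta'} := H(\theta_j^{i,\theta'},Y_{j+1})-h(\theta_j^{i,\theta'})$, $j\ge i$. Using (HL) and (HUA), we get

$$|\theta_{j+1}^{i,\theta}-\theta_{j+1}^{i,\theta'}|^2\le|\theta_j^{i,\theta}-\theta_j^{i,\theta'}|^2(1-2\lambda\gamma_{j+1}+[H]_1^2\gamma_{j+1}^2)-2\gamma_{j+1}\langle\theta_j^{i,\theta}-\theta_j^{i,\theta'},\Delta M_{j+1}^{i,\theta}-\Delta M_{j+1}^{i,\theta'}\rangle,$$

and, by induction on $j$, we easily obtain

$$|\theta_n^{i,\theta}-\theta_n^{i,\theta'}|^2\le|\theta-\theta'|^2\prod_{j=i}^{n-1}(1-2\lambda\gamma_{j+1}+[H]_1^2\gamma_{j+1}^2)-2\Big(\prod_{j=i}^{n-1}(1-2\lambda\gamma_{j+1}+[H]_1^2\gamma_{j+1}^2)\Big)\sum_{j=i}^{n-1}\tilde\gamma_{j+1}\langle\theta_j^{i,\theta}-\theta_j^{i,\theta'},\Delta M_{j+1}^{i,\theta}-\Delta M_{j+1}^{i,\theta'}\rangle, \tag{A.2}$$

where $\tilde\gamma_{j+1} := \gamma_{j+1}/\prod_{k=i}^{j}(1-2\lambda\gamma_{k+1}+[H]_1^2\gamma_{k+1}^2)$. Taking the expectation in (A.2), the martingale term vanishes, and we derive

$$\mathbb E[|\theta_n^{i,\theta}-\theta_n^{i,\theta'}|^2]\le|\theta-\theta'|^2\prod_{j=i}^{n-1}(1-2\lambda\gamma_{j+1}+[H]_1^2\gamma_{j+1}^2) = |\theta-\theta'|^2\,\Pi_n\Pi_i^{-1}.$$

Now,

$$|f_i^n(\theta,y)-f_i^n(\theta,y')| = |v_i(\theta-\gamma_iH(\theta,y))-v_i(\theta-\gamma_iH(\theta,y'))|\le\mathbb E\big[|\theta_n^{i,\theta-\gamma_iH(\theta,y)}-\theta_n^{i,\theta-\gamma_iH(\theta,y')}|\big]\le(\Pi_n\Pi_i^{-1})^{1/2}\gamma_i[H]_1|y-y'|,$$

which completes the proof of the Proposition.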
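In the same linear toy example, $H(\theta,y) = \theta-y$ with $\lambda = [H]_1 = 1$ (an illustrative assumption), the inequality for $\mathbb E[|\theta_n^{i,\theta}-\theta_n^{i,\theta'}|^2]$ becomes an equality. Indeed, $1-2\lambda\gamma+[H]_1^2\gamma^2 = (1-\gamma)^2$, and two trajectories driven by the same innovations satisfy $\theta_{j+1}^{i,\theta}-\theta_{j+1}^{i,\theta'} = (1-\gamma_{j+1})(\theta_j^{i,\theta}-\theta_j^{i,\theta'})$ pathwise. The NumPy sketch below runs two such coupled trajectories from step $i$ and compares their squared gap with $\Pi_n\Pi_i^{-1}|\theta-\theta'|^2$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, i = 200, 20
gammas = 1.0 / np.arange(1, n + 1)     # gammas[k - 1] = gamma_k = 1/k
Y = rng.standard_normal(n)             # Y[k - 1] = Y_k

def H(theta, y):
    return theta - y                    # (HL) with [H]_1 = 1, (HUA) with lambda = 1

def run_from(theta, i, n):
    # theta_n^{i,theta}: start from theta at step i, apply steps i+1, ..., n with Y_{i+1}, ..., Y_n
    for j in range(i, n):
        theta = theta - gammas[j] * H(theta, Y[j])    # uses gamma_{j+1} and Y_{j+1}
    return theta

theta, theta_prime = 0.3, -1.2
lhs = (run_from(theta, i, n) - run_from(theta_prime, i, n)) ** 2
lam, H_lip = 1.0, 1.0
ratio = np.prod(1.0 - 2.0 * lam * gammas[i:n] + H_lip**2 * gammas[i:n] ** 2)  # Pi_n / Pi_i
rhs = ratio * (theta - theta_prime) ** 2
print(lhs, rhs)
```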